BUG: JoinUnit.is_na wrong for CategoricalDtype #37196


Merged
merged 1 commit into from
Oct 20, 2020

jbrockmendel
Member

JoinUnit.is_na is essentially checking isna(self.block.values).all(). The is_categorical check is an attempted optimization, because values.categories is often much smaller than values. But Categorical represents its NAs in its codes, not in its categories, so for categorical blocks the status quo incorrectly always returns False.
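
To illustrate the point about codes versus categories, here is a minimal sketch that uses only the public pandas API. It does not touch JoinUnit or any other internals, and the variable names are made up for the example. It builds an all-NA Categorical with non-empty categories and compares the three ways of asking "is everything missing?":

```python
import numpy as np
import pandas as pd

# Illustrative only: an all-missing Categorical that still declares categories.
cat = pd.Categorical([np.nan, np.nan], categories=["a", "b"])

# Missing entries are stored as the sentinel code -1.
codes_all_na = bool((cat.codes == -1).all())

# The categories themselves never hold NAs, so this check cannot see them.
categories_all_na = bool(pd.isna(cat.categories).all())

# Checking the values directly gives the intended answer.
values_all_na = bool(pd.isna(cat).all())

print(codes_all_na, categories_all_na, values_all_na)
```

Checking the categories reports that the data is not all-NA even though every value is missing, which is the mismatch described above.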

Having trouble coming up with a useful test. I can adapt a test from test_concat that returns an incorrect answer from is_na, but that does not appear to affect the result of the higher-level pd.concat call.

@jreback added the Categorical, Dtype Conversions, and Internals labels Oct 17, 2020
@jreback
Contributor

jreback commented Oct 17, 2020

Surprised this doesn't break anything.

@jreback added this to the 1.2 milestone Oct 17, 2020
@jreback merged commit 951c9c1 into pandas-dev:master Oct 20, 2020
@jbrockmendel deleted the bug-is_na branch October 20, 2020 01:48
JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Oct 26, 2020
kesmit13 pushed a commit to kesmit13/pandas that referenced this pull request Nov 2, 2020
Development

Successfully merging this pull request may close these issues.

REF: Simplify JoinUnit.is_na for categorical
2 participants